ABSTRACT

Selected  statistical  features  of  the 
Age  Exploration  Program  for  F/A-18  aircraft 
are  examined  with  emphasis  upon  sample 
number   and  the  impact  of  inspection  errors 
upon  resulting  reliability  estimates.   The 
identification  of  aircraft  populations 
targeted  by  samples  of  fleet  leader  aircraft 
is  also  discussed. 


SUMMARY 

Implementation of the Age Exploration Program (AEP)
for F/A-18 aircraft by the Naval Air Systems Command involves
sampling fleet leader aircraft, with emphasis on inspection of
selected structural components. Sample size and the
interpretation of sample results are the subject of this report.

When the objective of sampling is reliability estimation,
one can, in addition to single point estimates,
construct confidence bounds for fleet reliability. These
reflect the quality of the estimate in terms of the size of
the sample taken. In AEP inspection to date, the usual
sampling result is that no discrepancies are found, hence
point estimates of reliability are 1.0. For the case of a
discrepancy-free sample, the functional relations and graphs
developed in this report permit one to place
a lower bound on fleet reliability as a function of
the number of aircraft inspected.

During inspection, some discrepancies may go
undiscovered. When this happens, sampling results
overstate reliability. In this paper a method is developed
to adjust sample size or reliability estimates to account
for the chance of inspection error, and curves are
provided to simplify this adjustment.

Since  aircraft  sampled  in  the  Age  Exploration  Program 
are  fleet  leaders  in  terms  of  usage,  they  are  not  particularly 
representative  of  the  F/A-18  fleet  that  exists  at  that  point 
in  time.   However,  they  should  be  representative  of  F/A-18 
aircraft  as  those  aircraft  reach  the  same  usage  level  that 
characterized the sample. Careful identification of this
future population increases the future usefulness of the
reliability estimates obtained from current AEP data.

STATISTICAL ASPECTS OF THE F/A-18
AGE  EXPLORATION  PROGRAM 


The Naval Air Systems Command has established the Age
Exploration Program (AEP) for F/A-18 aircraft using
Reliability-Centered Maintenance procedures in an effort to reduce
maintenance costs by specifying only the maintenance needed to
ensure flight integrity. Among other features of this program,
fleet  leader  aircraft  are  sampled  on  a  regular  basis,  with 
emphasis  on  inspection  of  selected  structural  components. 
It is the size of this sample and the statistical
interpretation of the resulting data that form the subject of
this report.

Since a stated purpose of sampling in AEP is the
estimation of fleet reliability, this report first discusses
reliability estimation, with emphasis on the relationship
between sample size and the goodness of the estimate, when
the measure of effectiveness for the estimate is confidence
interval size. Curves are provided for determining the lower
95% bound on reliability when no discrepancies are found in
the sample.

The next section of this report considers the effect
of inspection error on reliability estimation. Concepts
from signal detection theory are employed to develop
relationships which may be used to partially
compensate for these errors. Curves are provided which
permit adjustment of reliability confidence bounds when
discrepancies may go undiscovered during inspection of the
aircraft component.

The  relationship  of  sample  and  population  is  examined. 
Aircraft inspected under AEP are fleet leaders as identified
by several measures of wear and tear and of usage.
Identification of a population from which these aircraft may be
considered a representative sample is important, since it
is to this population that the reliability estimates will
apply. After suggesting how such a population might be
defined,  the  report  concludes  with  a  brief  review  of 
previous  studies  addressing  AEP  sampling. 

A. Reliability Estimation and Confidence Bounds

In sampling to estimate the proportion of a
population's items that possess some stated attribute, the
standard approach is to sample n items, count x possessing
the attribute, and then use the sample proportion x/n
as the estimate of the unknown population proportion. The
n trials or observations are assumed to be independent of
each other, and the chance of the attribute being present
should be the same in each trial.

In addition to the point estimate x/n, one can also
construct a useful interval estimate which will place
a lower bound on the unknown proportion. This lower bound
is computed from the data in such a way that there will be
a 95% chance that the bound will indeed be below the unknown
proportion. The result, for example, might say that we are
95% certain that a component's reliability is greater than
0.88, where the lower bound 0.88 was computed from the data
resulting from sampling. The confidence interval method
has the virtue of reflecting the size of the sample, and
thus the accuracy of the estimate.

Applying  these  ideas  to  reliability  estimation  is 
quite  straightforward.   We  are  concerned  with  an  aircraft 
population  of  finite  size,  where  the  unknown  reliability 
is  the  proportion  of  aircraft  in  the  population  that  do 
not  possess  a  discrepancy  at  a  particular  inspection  site 
on  the  aircraft,  such  as  the  stabilator  attach  fitting. 

If we sample (inspect) n aircraft and find x with
discrepancies at the inspection site, then our point
estimate for population reliability is

$$\hat{R} = \frac{n - x}{n} . \tag{1}$$

Statistical work with this kind of estimate usually assumes
that the sample was taken randomly from the population,
and that sampling was with replacement or from an
infinite population.


In application, a difficulty with a point estimate
such as (1) is that the estimate $\hat{R}$ itself does not provide
any measure of its closeness to the true reliability R.
Finding no discrepancies in a sample of ten items yields
the same estimate of reliability as finding no discrepancies
in a sample of 100 items. In both cases the reliability
estimate is $\hat{R} = 1.0$, but clearly we have more confidence
in the latter. Simply knowing that bigger samples give
better estimates (in terms of accuracy) does not offer
guidance regarding how big a sample one ought to take.
To relate sample size to the goodness of the estimate
requires a measure of the effectiveness of the estimate,
and this may be found through the application of confidence
intervals instead of point estimates.

The best-known procedure for developing confidence
intervals for proportions is attributed to Clopper and
Pearson, and we shall follow their approach [2]. We seek a
95% lower bounded confidence interval for reliability.
This means that we wish to use the data from the sample
to construct a lower bound for the unknown population
reliability, and that this lower bound should be such that
we are 95% certain that it is less than the population
reliability R. Thus from the sample data, we wish to find
a lower bound such that

Prob(Lower Bound < R) = 0.95.


The value of Lower Bound is to be computed from the
results of the sample, and we shall focus upon the AEP
experiences to date where the sample contains no
discrepancies. Thus x = 0, and $\hat{R} = 1.0$. From this sample result,
the lower bound is determined by asking how low the
population reliability could be while allowing a 5% chance
of no discrepancies in the sample. This value of
reliability will be the lower bound.

For reliability R and sample size n, the probability
of no discrepancies in the sample is $R^n$. Accordingly,
for a 5% chance of no discrepancies at our lower bound,
we have from the binomial distribution

$$(\text{Lower Bound})^n = 1 - 0.95$$

or

$$\text{Lower Bound} = (1 - 0.95)^{1/n} \tag{2}$$

as our 95% lower confidence bound on reliability R when
the sample result is no discrepancies. A similar derivation
could be made when the result is one discrepancy in the
sample, two discrepancies, and so on.
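
The bound in (2), and its extension to samples that contain one or more discrepancies, can be sketched in Python. This is an illustration only and not part of the original study: the function names are hypothetical, and the general case is solved here by bisection on the binomial tail probability, which is one common way of obtaining a Clopper-Pearson style lower bound.

```python
from math import comb


def lower_bound_no_discrepancies(n, confidence=0.95):
    """Equation (2): lower bound on reliability when x = 0 discrepancies."""
    return (1.0 - confidence) ** (1.0 / n)


def prob_at_most(x, n, reliability):
    """Binomial probability of x or fewer discrepancies in n inspections."""
    q = 1.0 - reliability  # chance that one aircraft has the discrepancy
    return sum(comb(n, k) * q**k * reliability ** (n - k) for k in range(x + 1))


def lower_bound(x, n, confidence=0.95, tol=1e-10):
    """Lower bound on reliability for x discrepancies in a sample of n.

    Finds, by bisection, the reliability at which the chance of seeing
    x or fewer discrepancies equals 1 - confidence.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # prob_at_most rises with reliability, so move toward the crossing.
        if prob_at_most(x, n, mid) < 1.0 - confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For a discrepancy-free sample the two functions agree; for example, `lower_bound_no_discrepancies(25)` evaluates to about 0.887. With one discrepancy in the same sample, `lower_bound(1, 25)` gives a smaller bound, as one would expect.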

From (2) it is clear that with a discrepancy-free
sample, our lower bound on population reliability R
increases with sample size. This is illustrated
numerically by the values in Table 1, showing lower bounds
associated with various sample sizes.


TABLE 1. Sample Size and 95% Lower
Confidence Bounds on Reliability When
No Discrepancies Are Found in the Sample


Sample Size    Lower Bound on Reliability

     10                 0.741
     15                 0.819
     20                 0.861
     25                 0.887
     30                 0.905
    100                 0.970

In application, we could say that if we took a
sample of size 25 and found no discrepancies, we would
be 95% certain that population reliability was greater
than 0.887. Stated differently, we would have 95%
confidence that no more than 11.3% of fleet aircraft of this
age will have the discrepancy. A plot showing lower
bounds as a function of sample size for the no-discrepancy
case is given in Figure 1.

FIGURE 1. Lower 95% Confidence Bounds for Fleet
Reliability When No Discrepancies Are Found
in the Sample.


B.  Effects  of  Inspection  Errors  on  Reliability  Estimation 

The foregoing discussion of point estimates and
lower confidence bounds for reliability tacitly assumed
that each observation was correct, in the sense that the
determination that an item did or did not possess a
discrepancy was without error. The body of literature
on inspection errors in non-destructive inspection is a
growing one, and there seems to be increasing concern
that the assumption of error-free performance on the part
of inspectors, inspection hardware, and inspection
procedures is questionable. In this section we
shall discuss the impact of errors on reliability estimates,
and develop a way of adjusting the estimate to partially
compensate for errors in data.

In a trial to determine whether an attribute
is present, two kinds of errors are possible. The
observation may be that the attribute is present when in
fact it is not, or the observation may be that the
attribute is not present when in fact it is. Error
performance on the part of the inspection process may
be expressed for our reliability estimation case in the
signal detection theory manner [7] by two measures:

$p_d$     as the probability of a correct detection
          of a discrepancy, i.e., the inspection
          concludes that a discrepancy is present
          given there truly is a discrepancy, and

$p_{fa}$  as the probability of a false alarm, i.e.,
          the inspection concludes that a discrepancy
          is present when in fact there is none.


Using these two measures of detection performance,
error-free inspection is described by

$$p_d = 1.0$$

and

$$p_{fa} = 0 .$$


Suppose a population of N items contained A items
with discrepancies and thus N-A good items, so that the
population's true reliability would be

$$R = \frac{N - A}{N} .$$

If we do 100% inspection (inspect every item in the
population), we will on the average recognize a
proportion $p_d$ of the A items with discrepancies. Additionally,
we will on the average declare a proportion $p_{fa}$ of the
good items to have discrepancies. In total, then,
our average count of items with discrepancies would be

$$p_d A + p_{fa} (N - A) .$$

From this, our statement of observed reliability after
100% inspection would be

$$R_{obs} = \frac{N - \left( p_d A + p_{fa} (N - A) \right)}{N} .$$


With some direct algebra, we have

$$R_{obs} = 1 - p_d (1 - R) - p_{fa} R ,$$

or

$$R_{obs} = 1 - p_d + R (p_d - p_{fa}) . \tag{3}$$



Thus from (3) we see that the average value of
observed reliability in 100% inspection is a linear
function of the true reliability R. An example of the
relative importance of the two kinds of inspection errors
is shown in Table 2, for inspection error performance of
the order of $p_d = 0.9$ and $p_{fa} = 0.1$.

TABLE 2. Examples of the Impact of Inspection
Errors on Expected Observed Reliability in
100% Inspection.


                         Expected Observed Reliability

True Reliability     p_d = 0.9      p_d = 1.0      p_d = 0.9
                     p_fa = 0       p_fa = 0.1     p_fa = 0.1

      1.00             1.000          0.900          0.900
      0.95             0.955          0.855          0.860
      0.90             0.910          0.810          0.820
      0.85             0.865          0.765          0.780
      0.80             0.820          0.720          0.740
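
The entries of Table 2 follow directly from relationship (3). As a small illustrative sketch (the function names are hypothetical and not taken from the report), the table can be regenerated as follows.

```python
def expected_observed_reliability(R, p_d, p_fa):
    """Equation (3): average observed reliability under 100% inspection."""
    return 1.0 - p_d + R * (p_d - p_fa)


def table_2_rows():
    """Return (true reliability, three observed values) for each Table 2 row."""
    cases = ((0.9, 0.0), (1.0, 0.1), (0.9, 0.1))  # (p_d, p_fa) column pairs
    rows = []
    for R in (1.00, 0.95, 0.90, 0.85, 0.80):
        observed = tuple(
            round(expected_observed_reliability(R, p_d, p_fa), 3)
            for p_d, p_fa in cases
        )
        rows.append((R, observed))
    return rows
```

Reading down the last two columns shows that, at these error rates, false alarms depress the observed reliability considerably more than missed detections inflate it when true reliability is high.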

Returning to the relationship (3), if we solve it
for actual reliability R, we have

$$R = \frac{p_d - (1 - R_{obs})}{p_d - p_{fa}} . \tag{4}$$



It is important at this time to again emphasize
that $R_{obs}$ is an average or expected value. When errors
are possible ($p_d < 1.0$ or $p_{fa} > 0$), doing 100% inspection
on the same population several times would probably yield
a different reliability value each time. Equation (3)
refers to the average result, and it is this average or
expected value that is the argument in (4).
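
The distinction between a single observed value and its expected value can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than anything from the original study: it treats each item's inspection outcome as an independent random draw, and the function names, population size, and seed are arbitrary choices.

```python
import random


def simulate_full_inspection(N, A, p_d, p_fa, rng):
    """One 100% inspection of N items, A of which truly have discrepancies."""
    # Each true discrepancy is detected with probability p_d.
    detected = sum(rng.random() < p_d for _ in range(A))
    # Each good item is wrongly flagged with probability p_fa.
    false_alarms = sum(rng.random() < p_fa for _ in range(N - A))
    return (N - detected - false_alarms) / N


def mean_observed_reliability(N, A, p_d, p_fa, repeats=200, seed=1):
    """Repeat the inspection and return (mean, smallest, largest) result."""
    rng = random.Random(seed)
    results = [
        simulate_full_inspection(N, A, p_d, p_fa, rng) for _ in range(repeats)
    ]
    return sum(results) / repeats, min(results), max(results)
```

With N = 1000, A = 100, p_d = 0.9 and p_fa = 0.1, relationship (3) gives an expected observed reliability of 0.82. Individual simulated inspections scatter around that value, while their average settles close to it.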

Returning to the effects of inspection errors on
sample results, it is tempting to use the function (4)
as a way of adjusting sample reliability results $\hat{R}$
to account for possible errors. If we sample n items
from the population, count x with discrepancies, and
compute reliability estimate $\hat{R} = (n-x)/n$, we might
improve the estimate by adjusting it for inspection
errors via

$$\hat{R}_{adj} = \frac{p_d - (1 - \hat{R})}{p_d - p_{fa}} . \tag{5}$$

Note that this requires prior estimates of $p_d$ and $p_{fa}$
if one wishes to adjust the sample reliability estimate
to account for possible inspection errors.

While a seemingly reasonable way to "improve"
estimates, application of (5) can lead to values for
adjusted reliability $\hat{R}_{adj}$ which are negative, or which
are greater than 1.0. This is because we have replaced the
mean or average value of observed reliability in (4) by
our direct reliability estimate $\hat{R}$, which is a random
variable. In small samples from the same population,
$\hat{R}$ could be very large, or very small. We can generally
say that our adjusted reliability estimate will be in
the range

$$0 \le \hat{R}_{adj} \le 1.0$$

when

$$(1 - p_d) \le \hat{R} \le (1 - p_{fa}) .$$
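
The adjustment in (5) and the accompanying range condition can be written out as a short sketch. The function names are hypothetical, and the default of zero for the false alarm probability is an assumption made for convenience; in that case the adjustment reduces to 1 - (1 - R_hat)/p_d.

```python
def adjusted_reliability(r_hat, p_d, p_fa=0.0):
    """Equation (5): sample reliability adjusted for inspection errors."""
    if not p_d > p_fa:
        raise ValueError("detection probability must exceed false alarm probability")
    return (p_d - (1.0 - r_hat)) / (p_d - p_fa)


def adjustment_in_range(r_hat, p_d, p_fa=0.0):
    """True when (1 - p_d) <= r_hat <= (1 - p_fa), so the result is in [0, 1]."""
    return (1.0 - p_d) <= r_hat <= (1.0 - p_fa)
```

For instance, a sample estimate of 0.8 with p_d = 0.8 and no false alarms adjusts to 0.75. A sample estimate of 0.95 with p_d = 0.9 and p_fa = 0.1 falls outside the stated range, and the adjusted value exceeds 1.0.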


A case of interest in the Age Exploration Program
is that where $p_{fa}$ is presumed to be small or negligible
because discrepancies discovered by one inspection method
are "confirmed" by a different inspection method. If we
assume $p_{fa} = 0$, then with an estimate of discrepancy
detection probability $p_d$, we would from (5) adjust our
reliability estimate by

$$\hat{R}_{adj} = 1 - \frac{1 - \hat{R}}{p_d} . \tag{6}$$

Numerical examples for various values of $p_d$ are shown in Table 3,
where we can see the magnitude of adjustment or correction
of reliability estimates that would occur when we feel
that discrepancy detection is imperfect.


TABLE 3. Reliability Point Estimates
Adjusted for Discrepancy Detection
Probabilities p_d, where p_fa = 0


Reliability
Estimate
from Sample                Adjusted Estimate R_adj

    R         p_d=0.9   p_d=0.8   p_d=0.7   p_d=0.6   p_d=0.5

   0.5         0.44      0.37      0.29      0.17      0
   0.6         0.55      0.50      0.43      0.33      0.20
   0.7         0.66      0.62      0.57      0.50      0.40
   0.8         0.77      0.75      0.71      0.67      0.60
   0.9         0.89      0.87      0.86      0.83      0.80
   1.0         1.00      1.00      1.00      1.00      1.00

The same adjustment can be made to our estimate of
reliability using confidence intervals. Figure 2 shows
the lower 95% confidence bounds on reliability adjusted for
various values of discrepancy detection probability $p_d$,
for the case where no discrepancies were found in the sample.
Thus if we felt that the chance of finding a discrepancy
in inspection was $p_d = 0.8$ and had found no discrepancies
in a sample of size 30, we might state with 95% certainty
that the population reliability was greater than 0.88.
In other words, we have 95% confidence that no more than
12% of fleet aircraft at this age will have the discrepancy.

Using Figure 2 it is possible, of course, to make
a reliability estimate before the entire sample of 30 is
inspected. After the first ten aircraft were inspected
our lower bound at $p_d = 0.8$ would be 0.68 for reliability.
This estimate and the later one at n=30 are, of course,
not independent.

Functionally, the curves in Figure 2 show

$$\text{Lower Bound}_{adj} = 1 - \frac{1}{p_d} \left( 1 - (1 - 0.95)^{1/n} \right) . \tag{7}$$

Figures 3 and 4 provide the same information as
Figure 2 for confidence bounds of 90% and 99%, respectively.
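
The curves described by (7) can be evaluated directly. The following sketch uses a hypothetical function name and assumes, as in the text, that false alarms are negligible; the confidence level is left as a parameter so that the 90% and 99% cases can be computed the same way.

```python
def adjusted_lower_bound(n, p_d, confidence=0.95):
    """Equation (7): lower bound on reliability for a discrepancy-free
    sample of size n, adjusted for detection probability p_d."""
    unadjusted = (1.0 - confidence) ** (1.0 / n)  # equation (2)
    return 1.0 - (1.0 - unadjusted) / p_d
```

With p_d = 0.8, a discrepancy-free sample of 30 gives a value of about 0.88, and the first ten aircraft alone give about 0.68, in agreement with the figures quoted in the text.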


[Figures 2, 3, and 4 appeared here. Each plots Adjusted Lower
Confidence Bounds (vertical axis) against Sample Size (horizontal
axis) for several discrepancy detection probabilities, for the
case where no discrepancies are found in the sample.]


C.   Accounting  for  Finite  Populations 

The  foregoing  work  assumes  that  our  samples  come  from 
populations  of  infinite  size,  or  from  sampling  with 
replacement.   This  was  inherent  in  our  tacit  use  of  the 
binomial  probability  distribution.   In  sampling  in  the 
Age  Exploration  Program,  however,  populations  will  be 
finite  in  size,  and  sampling  is  without  replacement. 

When populations are finite, the correct probability
distribution for the number x possessing the attribute
out of a sample of size n is the hypergeometric
distribution; this would have involved the use of population
size in our calculations. It has been frequently
demonstrated, however, that when the sample size is less than
10% of the population size, the hypergeometric is well
approximated by the binomial distribution.

Where the sample size exceeds 10% of the population,
the lower bound value for reliability as computed earlier
in this paper would understate the true value, and thus the
error would be on the conservative side. For example, with a
sample of 30 from a population of 300, the lower bound from
the binomial is 0.9050, while the hypergeometric value for
the lower bound is 0.9096. For aircraft populations of size
20, 30, 40, 50, and 100, sample size curves from the
hypergeometric distribution are given in the Appendix to this
report.
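
A finite-population version of the discrepancy-free lower bound can be sketched with the hypergeometric distribution. The code below is an illustration under stated assumptions, not the report's procedure: the function names are hypothetical, and the bound is restricted to whole numbers of aircraft (the largest count of discrepant aircraft still allowing at least a 5% chance of a clean sample), so it need not reproduce interpolated values such as the 0.9096 quoted above.

```python
from math import comb


def prob_no_discrepancies(N, A, n):
    """Hypergeometric chance that a sample of n aircraft drawn without
    replacement from N misses all A aircraft that have the discrepancy."""
    return comb(N - A, n) / comb(N, n)


def finite_population_lower_bound(N, n, confidence=0.95):
    """Lower bound on reliability after a discrepancy-free sample of n from N."""
    alpha = 1.0 - confidence
    A = 0
    # Increase the number of discrepant aircraft while a clean sample
    # remains at least as likely as alpha. At most N - n can be discrepant,
    # because the n inspected aircraft were all found to be good.
    while A + 1 <= N - n and prob_no_discrepancies(N, A + 1, n) >= alpha:
        A += 1
    return (N - A) / N
```

Because the chance of a clean sample is never larger under sampling without replacement than under the binomial model at the same reliability, this bound is always at least as high as the value from (2), consistent with the remark that the binomial bound errs on the conservative side.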

D. Characterizing the Sample

Because  they  consist  of  fleet  leader  aircraft,  the 
samples  taken  and  inspected  in  the  Age  Exploration  Program 
are not representative of the entire fleet of F/A-18
aircraft  that  exists  at  the  time  the  sample  is  taken. 
Accordingly,  it  is  necessary  to  identify  or  characterize 
the  population  for  which  reliability  is  being  estimated, 
and  thus  for  which  the  sample  should  be  representative. 

Aircraft  which  are  chosen  to  be  in  the  sample  are 
selected  on  the  basis  of  age  or  usage,  as  defined  by 
one  or  more  measures.   Two  examples  of  these  measures 
are  cumulative  arrestments,  and  the  current  value  of  the 
wing  root  fatigue  index.   The  reliability  estimated 
from  the  sample  should  be  applicable  to  aircraft  when 
they  reach  the  age  range  represented  in  the  sample. 
Such a population does not exist at any single point in time;
indeed, some of the aircraft addressed may not have been
built yet.

The  sample  in  AEP  is  not  a  random  one.  (A  random 
sample  is  one  taken  in  such  a  way  that  each  element  of 
the  population  has  an  equal  chance  of  being  in  the  sample.) 
For our purposes, however, we will assume that the aircraft
inspected are a representative sample of F/A-18 aircraft
in the age range characterizing the sample. The practice
of using a sample of today's items to make statistical
inferences about future similar items is widely followed
in agricultural, biological, medical, and even military
experimental work.

E.  Defining  the  Population  for  which  Reliability  is 
Estimated 

Suppose  only  one  measure  of  aircraft  age  is  used  to 
describe  the  1987  AEP  sample,  and  for  discussion  purposes, 
suppose  that  measure  is  wing  root  fatigue  index.   The 
sample then can be characterized as having wing root
fatigue index values between $F_1$ and $F_2$, and it seems
reasonable that our reliability estimate would then be
applicable to a population of aircraft which also have
wing root fatigue index values between $F_1$ and $F_2$. At some
time in their lives, most fleet aircraft may, as they age,
be members of this population. It is when they are at
that "age" that the reliability estimate will be applicable
to them.

F.  Other  Studies  Seeking  Sample  Size 

This  report  has  treated  the  purpose  of  AEP  inspection 
as  estimation  of  reliability,  and  the  work  has  centered 
upon  relating  the  quality  of  such  estimates  to  the 
number of aircraft sampled. Using the goodness of the
estimate as the measure of effectiveness, procedures were
developed for determining sample size, and also for the
inclusion of inspection error in finding final sample
size and reliability estimate.

In  the  past,  other  measures  of  effectiveness  have 
been  used  to  propose  sampling  procedures  and  sample 
sizes  for  aircraft  maintenance.   These  are  briefly 
described  and  contrasted  below. 

MCAIR. In their 1983 report from McDonnell Aircraft
Company, Smith and Swanson proposed an initial sample of
size 22 for AEP [8]. This satisfied their criterion that if
10% of aircraft have discrepancies, there should be a chance
of 0.9 that the sample will include one or more aircraft
with discrepancies. Use of values other than 10% and 0.9
would have yielded different sample sizes. Their criterion
assumes that a representative sample has come from an
aircraft population having 10% with discrepancies. Since
those in the sample are to be the most severely used
aircraft, it is clear that the sample is not representative
of the group of 450 aircraft to which it was restricted,
but of a population of aircraft with similar usage.
Applied to reliability estimation (assuming $p_d = 0.7$),
a sample of size 22 with no discrepancies found would
give us 95% certainty that the reliability was greater
than 0.82, in a population of similar age and use.
After this initial sample, they suggest a sample from
each of the two remaining sets of 450 aircraft employing
a procedure called Bayesian. This approach involves
the assumption of a specific probability distribution
for fleet reliability, prior to the actual sampling.
This a priori distribution is then combined with the
actual data from the sample to produce an a posteriori
probability distribution of reliability. Their report
does not indicate which a priori distribution they use,
how it is to be combined with actual data, or properties
of the results.
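
The sample size of 22 can be checked against the stated criterion with a few lines of code. This sketch assumes independent binomial trials, which is an assumption made here for illustration and not necessarily the calculation used in the MCAIR report; the function name is hypothetical.

```python
def smallest_sample_for_detection(discrepancy_fraction=0.10, chance=0.90):
    """Smallest n for which a sample includes at least one discrepant
    aircraft with the stated chance, assuming independent trials."""
    n = 1
    # Probability of at least one discrepant aircraft is 1 - (1 - q)^n.
    while 1.0 - (1.0 - discrepancy_fraction) ** n < chance:
        n += 1
    return n
```

With the default values of 10% and 0.9 the function returns 22. Changing either value changes the answer, which illustrates the remark that other choices would have yielded different sample sizes.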

USAF. A different inspection criterion is used by
the United States Air Force in their sample-based
Analytical Condition Inspection (ACI) Program for the
F-15 aircraft [9]. This procedure operates like statistical
hypothesis tests applied as acceptance sampling or control
charts. A double sampling procedure is used [10]. A
sample of size 11 is taken. No action follows if no
discrepancies are found. If exactly one discrepancy is
found, a second sample of size 13 is taken, and should it
contain any discrepancies, corrective action follows.
Corrective action also ensues if more than one discrepancy
was found in the first sample. The action or no-action
results of this sampling procedure place it in the realm
of statistical hypothesis testing rather than estimation.
For this program an operating characteristic curve could
be constructed showing the probabilities of no corrective
action as a function of fleet reliability. Using these
data to estimate reliability leads to problems because of
unequal sample sizes, making year to year results not
comparable as point estimates if a second sample is
periodically taken. When no discrepancies are found,
the sample is of size 11 and we would on the basis of this
be 95% certain that reliability is greater than 0.66; this
assumes 70% detection probability in inspection. Sample
data will, of course, accumulate from year to year.
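
The operating characteristic curve mentioned for the double sampling plan can be sketched as follows. The calculation assumes independent binomial trials and error-free inspection; both are simplifying assumptions introduced here, and the function name is hypothetical.

```python
def prob_no_corrective_action(R, n1=11, n2=13):
    """Chance that the double sampling plan ends with no corrective action
    when each aircraft is free of the discrepancy with probability R."""
    q = 1.0 - R
    none_in_first = R ** n1                 # no discrepancies in first sample
    one_in_first = n1 * q * R ** (n1 - 1)   # exactly one in first sample
    none_in_second = R ** n2                # second sample is clean
    return none_in_first + one_in_first * none_in_second
```

Evaluating this function over a range of R values traces the curve: it equals 1.0 when R = 1.0, falls as reliability decreases, and reaches zero at R = 0.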

NARF, North Island. In the 1982 report 001-82 for
the NARF, North Island, J.D. Hayes employs "the level of
confidence that the sample is analogous to a population
which in fact has at least the specified reliability".
This statement, which has been discussed by Haff [12],
appears to be a requirement statement by which a sample
size can be deduced. Although the measure of sampling
effectiveness is different, the equations which accompany
the procedure produce sample size curves which, with a
different interpretation, yield values similar to those
in this report when $p_d = 1.0$.

These three earlier studies may be summarized.
MCAIR produced a sample size of 22 to satisfy a stated
probability statement. The Air Force used a method
mirroring statistical hypothesis testing for their
sampling procedure, which is directed toward corrective
action rather than estimating reliability. The 1982
NARF report employed probability statements to produce
expressions similar to those developed early in this
report. None of the three studies explicitly considered
the effects of inspection error on the data or on the
needed sample size.


G.  Concluding  Remarks 

Deciding  on  sample  size  for  any  empirical  activity 
requires  criteria  or  effectiveness  measures  by  which 
the  effects  of  various  alternative  sample  sizes  can  be 
compared  and  judged.   In  this  study  we  have  taken  the 
purpose  of  sampling  to  be  that  of  generating  estimates 
of reliability, and then used the goodness of the
estimate (as measured by confidence interval size) as
the criterion.

This  permits  the  user  through  the  figures  and  tables 
given  in  this  report  to  evaluate  and  compare  different 
sample  numbers.   If  one  wishes  to  determine  a  single 
number  as  sample  size,  an  acceptable  lower  bound  for  the 
reliability   estimate  must  also  be  given.   If  we  say  that 
with  no  discrepancies  in  the  sample,  we  want  to  be  95%  certain 
that  fleet  reliability  is  greater  than  X,  then  the  required 
sample size value can readily be obtained from the given
curves.
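
As an alternative to reading the curves, the required sample size can be computed by inverting relationship (7). The sketch below uses a hypothetical function name, assumes a discrepancy-free sample and negligible false alarms, and treats the detection probability as an optional input that defaults to error-free inspection.

```python
def required_sample_size(target_bound, p_d=1.0, confidence=0.95):
    """Smallest discrepancy-free sample size whose adjusted lower
    confidence bound on reliability reaches target_bound."""
    if not 0.0 < target_bound < 1.0:
        raise ValueError("target_bound must lie strictly between 0 and 1")
    n = 1
    # Step up n until the adjusted bound from equation (7) meets the target.
    while 1.0 - (1.0 - (1.0 - confidence) ** (1.0 / n)) / p_d < target_bound:
        n += 1
    return n
```

For example, to be 95% certain that fleet reliability exceeds 0.90 with error-free inspection, the function returns 29 aircraft; requiring a bound of 0.88 with a detection probability of 0.8 returns 30.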
We have provided for the adjustment of the above
values to account for possible inspection errors. Here,
Figure 2 on Page 15 is probably most useful. The chances
of errors are described by the probability of detecting
an existing discrepancy, p_d. Often, in application, error
possibilities are not taken into account because it is
felt too difficult to estimate the detection probability.
In this regard it should be pointed out that not taking
error into account is equivalent to estimating p_d = 1.0,
and if one feels errors are made, one should be able
to formulate a better estimate of p_d.
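To illustrate how a detection probability below 1.0 weakens the bound, the sketch below uses one simple model, stated here as an assumption rather than as the method behind Figure 2: each discrepant aircraft in the sample is detected independently with probability p_detect, so an inspected aircraft appears clean with probability 1 - p_detect(1 - R). The function name and parameters are hypothetical.

```python
def adjusted_lower_bound(n: int, p_detect: float, confidence: float = 0.95) -> float:
    """Lower reliability bound after n apparently clean inspections.

    Assumed model (not taken from the report): P(aircraft appears clean)
    = 1 - p_detect * (1 - R), with a large fleet (binomial sampling).
    Solving (1 - p_detect * (1 - R))**n = 1 - confidence for R gives the bound.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must lie in (0, 1]")
    alpha = 1.0 - confidence
    # Largest per-aircraft rate of *detected* discrepancies consistent
    # with seeing none in n inspections.
    detected_rate = 1.0 - alpha ** (1.0 / n)
    # Scale up to the true discrepancy rate; the bound cannot go below zero.
    return max(0.0, 1.0 - detected_rate / p_detect)


if __name__ == "__main__":
    for p in (1.0, 0.9, 0.7, 0.5):
        print(p, round(adjusted_lower_bound(29, p), 3))
```

With p_detect = 1.0 the result reduces to the unadjusted bound, which is consistent with the remark that ignoring inspection error amounts to assuming perfect detection.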

From  an  estimation  point  of  view,  a  crucial  part 
of   AEP  sampling  is  identifying  the  population  for  which 
the  samples  are  representative.   It  is  hoped  that  the 
work  presented  in  this   report  will  assist  in   identifying 
that  population,  and  will  be  useful  to  those  who  must 
interpret  and  apply  the  results  of  AEP  sampling. 


APPENDIX:  SAMPLE  SIZE 
FOR  FINITE  POPULATIONS 


When the population is small so that the sample exceeds
10% of the population, the binomial distribution should no
longer be used as an approximation to the hypergeometric
distribution. In this appendix we shall use the hypergeometric
distribution to provide fleet reliability confidence
bounds as a function of sample size for populations
of size 20, 30, 40, 50, and 100 aircraft.

The  hypergeometric  probability  distribution  is 

$$\mathrm{Prob}(x \mid n, m, N) = \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}} \qquad (8)$$


where 


N  is  the  number  in  the  population, 

m  is  the  number  in  the  population  that 
possess  the  attribute, 

n  is  the  sample  size,  and 

x  is  the  number  in  the  sample  that 
possess  the  attribute. 


Here, reliability is R = m/N.

Our case of interest is when no discrepancies are
found in the sample. Here, x = n, and the probability
of this from (8) is

$$\mathrm{Prob}(x = n \mid n, m, N) = \frac{m!\,(N-n)!}{(m-n)!\,N!} \qquad (9)$$
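The probability of a discrepancy-free sample is easy to evaluate numerically as a ratio of binomial coefficients, C(m,n)/C(N,n), which is what the hypergeometric distribution gives when x = n. A minimal Python sketch follows; the function names are illustrative and not from the report.

```python
from math import comb


def prob_clean_sample(n: int, m: int, N: int) -> float:
    """P(x = n | n, m, N): all n sampled aircraft are among the m sound ones."""
    if not 0 <= m <= N:
        raise ValueError("m must satisfy 0 <= m <= N")
    if not 0 <= n <= N:
        raise ValueError("n must satisfy 0 <= n <= N")
    # math.comb(m, n) returns 0 when n > m, so the probability is 0 there.
    return comb(m, n) / comb(N, n)


def confidence_level(n: int, m: int, N: int) -> float:
    """Confidence that reliability exceeds m/N after a clean sample of n."""
    return 1.0 - prob_clean_sample(n, m, N)


if __name__ == "__main__":
    N = 20
    for m in range(15, 20):
        print(m / N, [round(confidence_level(n, m, N), 3) for n in (6, 10, 13, 19)])
```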


For a 95% lower confidence bound, this probability should
equal 0.05, where the bound is m/N. However, we cannot find
exact 95% lower confidence bounds by solving

Prob(x = n | n,m,N) = 0.05

for the bound m/N, since both m and N are integer valued.

In a population of size N = 20, for example, m = 0, 1, 2,
..., 19, 20. Thus the number of possible reliability
values for the population is finite, namely N+1 = 21
values.

Partial numerical results from searching for 90% and
95% lower confidence bounds for fleet reliability when
fleet size is N = 20 are shown in Table 4. The values in
the table are confidence levels for various lower reliability
bounds and sample sizes. For example, with a sample of size
13 from a population of 20 aircraft, we have

Prob(0.90 < Reliability) = 0.889

and

Prob(0.85 < Reliability) = 0.969.
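As a check on these two values, assuming they correspond to m = 18 and m = 17 sound aircraft in the fleet of N = 20, the probability of a discrepancy-free sample of 13 can be subtracted from one by hand:

```latex
\begin{align*}
1 - \frac{\binom{18}{13}}{\binom{20}{13}} &= 1 - \frac{8568}{77520} \approx 0.889, \\
1 - \frac{\binom{17}{13}}{\binom{20}{13}} &= 1 - \frac{2380}{77520} \approx 0.969.
\end{align*}
```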


TABLE 4. Examples of Probabilities
Computed from the Hypergeometric
Distribution when x=n and Population
Size is N = 20.

                m:     15     16     17     18     19
                R:   0.75   0.80   0.85   0.90   0.95
Sample Size
     6               .871     --     --     --     --
     7               .917     --     --     --     --
     8               .949   .898     --     --     --
     9               .970   .932     --     --     --
    10               .984   .957   .895     --     --
    11               .992   .974   .926     --     --
    12                 --   .986   .951     --     --
    13                 --   .993   .969   .889     --
    14                 --     --   .982   .921     --
    15                 --     --   .991   .947     --
    16                 --     --     --   .968     --
    17                 --     --     --   .984     --
    18                 --     --     --   .995   .900
    19                 --     --     --     --   .950


Thus,  exact  95%  confidence  bounds  cannot  in  most  cases 
be  obtained. 

Figure 5 shows approximate 95% lower confidence bounds
for fleet reliability as a function of sample size, for
populations of size 20, 30, 40, 50, and 100 aircraft. It
can be seen that as population size grows, the number of
possible reliability values grows, and the curves approach
that of Figure 1 in the body of this report, where the
binomial distribution was used. It should be pointed
out again that because reliability has become a discrete
parameter with a finite number of values, only the plotted
points, not the curves, are defined. Also, visible
irregularities are present since exact 95% confidence levels
could not be obtained.
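The report does not state how the approximate points were selected. One conservative convention, offered here only as an assumption, is to take for each sample size the largest value m/N whose attained confidence is at least the target. The sketch below implements that search with exact rational arithmetic; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def lower_bound_points(N: int, confidence: Fraction = Fraction(95, 100)) -> dict:
    """Map each sample size n to (bound m/N, attained confidence).

    Assumed convention: for a discrepancy-free sample of n from a fleet of N,
    pick the largest m < N with 1 - C(m, n)/C(N, n) >= confidence.
    """
    points = {}
    for n in range(1, N + 1):
        for m in range(N - 1, -1, -1):  # try the highest bounds first
            attained = 1 - Fraction(comb(m, n), comb(N, n))
            if attained >= confidence:
                points[n] = (Fraction(m, N), attained)
                break
    return points


if __name__ == "__main__":
    for n, (bound, attained) in lower_bound_points(20).items():
        print(n, float(bound), round(float(attained), 3))
```

Because the attained confidence usually overshoots the target by a different amount at each sample size, points chosen this way rise in uneven steps, which is one plausible source of the irregularities noted above.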

Plotted points in Figures 6 through 10 adjust the
fleet reliability bounds from Figure 5 to reflect the
possibilities of undetected discrepancies. Figures 11
through 15 repeat Figures 6 through 10, but for 90%
confidence bounds rather than 95%.
FIGURE 5. Lower 95% Confidence Bounds for Fleet
Reliability when no Discrepancies are found in
the Sample, for Populations of 20, 30, 40, 50,
and 100 Aircraft.

Figure 11. Population 20 Aircraft. Lower
90% Confidence Bounds for Fleet Reliability
when no Discrepancies are found
in the Sample.

Figure 12. Population 30 Aircraft. Lower
90% Confidence Bounds for Fleet Reliability
when no Discrepancies are found
in the Sample.

Figure 14. Population 50 Aircraft. Lower
90% Confidence Bounds for Fleet Reliability
when no Discrepancies are found
in the Sample.

Figure 15. Population 100 Aircraft. Lower
90% Confidence Bounds for Fleet Reliability
when no Discrepancies are found in
the Sample.
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